
 model splitting



Accelerating Wireless Distributed Learning via Hybrid Split and Federated Learning Optimization

arXiv.org Artificial Intelligence

Federated learning (FL) and split learning (SL) are two effective distributed learning paradigms in wireless networks, enabling collaborative model training across mobile devices without sharing raw data. While FL supports low-latency parallel training, it may converge to a less accurate model. In contrast, SL achieves higher accuracy through sequential training but suffers from increased delay. To leverage the advantages of both, hybrid split and federated learning (HSFL) allows some devices to operate in FL mode and others in SL mode. This paper aims to accelerate HSFL by addressing three key questions: 1) How does learning mode selection affect overall learning performance? 2) How does it interact with batch size? 3) How can these hyperparameters be jointly optimized alongside communication and computational resources to reduce overall learning delay? We first analyze convergence, revealing the interplay between learning mode and batch size. Next, we formulate a delay minimization problem and propose a two-stage solution: a block coordinate descent method for a relaxed problem to obtain a locally optimal solution, followed by a rounding algorithm to recover integer batch sizes with near-optimal performance. Experimental results demonstrate that our approach significantly accelerates convergence to the target accuracy compared to existing methods.


Breaking the Memory Wall for Heterogeneous Federated Learning via Model Splitting

arXiv.org Artificial Intelligence

Federated Learning (FL) enables multiple devices to collaboratively train a shared model while preserving data privacy. Ever-increasing model complexity coupled with limited memory resources on the participating devices severely bottlenecks the deployment of FL in real-world scenarios. Thus, a framework that can effectively break the memory wall while jointly taking into account the hardware and statistical heterogeneity in FL is urgently required. In this paper, we propose SmartSplit, a framework that effectively reduces the memory footprint on the device side while guaranteeing the training progress and model accuracy for heterogeneous FL through model splitting. Towards this end, SmartSplit employs a hierarchical structure to adaptively guide the overall training process. In each training round, the central manager, hosted on the server, dynamically selects the participating devices and sets the cutting layer by jointly considering the memory budget, training capacity, and data distribution of each device. The MEC manager, deployed within the edge server, proceeds to split the local model and perform training of the server-side portion. Meanwhile, it fine-tunes the splitting points based on the time-evolving statistical importance. The on-device manager, embedded inside each mobile device, continuously monitors the local training status while employing cost-aware checkpointing to match the runtime dynamic memory budget. Extensive experiments on representative datasets are conducted on commercial off-the-shelf mobile device testbeds. The experimental results show that SmartSplit excels in FL training on highly memory-constrained mobile SoCs, offering up to a 94% peak latency reduction and 100-fold memory savings. It improves accuracy by 1.49%-57.18% and adaptively adjusts to dynamic memory budgets through cost-aware recomputation.


Collaborative Inference via Dynamic Composition of Tiny AI Accelerators on MCUs

arXiv.org Artificial Intelligence

The advent of tiny AI accelerators opens opportunities for deep neural network deployment at the extreme edge, offering reduced latency, lower power cost, and improved privacy in on-device ML inference. Despite these advancements, challenges persist due to inherent limitations of these accelerators, such as restricted onboard memory and single-device focus. This paper introduces Synergy, a system that dynamically composes tiny AI accelerators for multi-tenant models, effectively addressing tinyML's critical challenges for the increasing demand for on-device AI. A key feature of Synergy is its virtual computing space, providing a unified, virtualized view of resources and enabling efficient task mapping to physical devices. Synergy's runtime orchestration module ensures optimal inference across dynamic and heterogeneous accelerators. Our evaluations with 7 baselines and 8 models demonstrate that Synergy improves throughput by an average of 8.0X compared to baselines.


Accelerating Split Federated Learning over Wireless Communication Networks

arXiv.org Artificial Intelligence

The development of artificial intelligence (AI) provides opportunities for the promotion of deep neural network (DNN)-based applications. However, the large number of parameters and the computational complexity of DNNs make them difficult to deploy on resource-constrained edge devices. An efficient method to address this challenge is model partition/splitting, in which a DNN is divided into two parts that are deployed on the device and the server, respectively, for co-training or co-inference. In this paper, we consider a split federated learning (SFL) framework that combines the parallel model training mechanism of federated learning (FL) and the model splitting structure of split learning (SL). We consider a practical scenario of heterogeneous devices with individual split points of the DNN. We formulate a joint problem of split point selection and bandwidth allocation to minimize the system latency. By using alternating optimization, we decompose the problem into two sub-problems and solve them optimally. Experimental results demonstrate the superiority of our work in latency reduction and accuracy improvement.
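The split point selection mentioned in this abstract can be illustrated with a minimal sketch for a single device under a fixed bandwidth. This is not the paper's alternating optimization, which also allocates bandwidth across devices. The function name `best_split_point`, the per-layer timing lists, and the latency model (device compute plus activation transfer plus server compute) are all assumptions made for illustration.

```python
from typing import List, Tuple


def best_split_point(
    device_time: List[float],      # assumed per-layer compute time on the device (s)
    server_time: List[float],      # assumed per-layer compute time on the server (s)
    activation_bits: List[float],  # bits sent if the cut is placed before layer k (n + 1 entries)
    bandwidth_bps: float,          # assumed fixed uplink rate (bits/s)
) -> Tuple[int, float]:
    """Brute-force the cut index k that minimises one-pass latency.

    Cut k means layers [0, k) run on the device and layers [k, n) on the server.
    """
    n = len(device_time)
    if len(server_time) != n or len(activation_bits) != n + 1:
        raise ValueError("need n device times, n server times, n + 1 activation sizes")

    best_k, best_latency = 0, float("inf")
    for k in range(n + 1):
        latency = (
            sum(device_time[:k])                   # device-side layers
            + activation_bits[k] / bandwidth_bps   # transfer across the cut
            + sum(server_time[k:])                 # server-side layers
        )
        if latency < best_latency:
            best_k, best_latency = k, latency
    return best_k, best_latency


# Hypothetical three-layer model on a slow device with a 1 Mbit/s link.
k, latency = best_split_point(
    device_time=[4.0, 4.0, 4.0],
    server_time=[1.0, 1.0, 1.0],
    activation_bits=[8e6, 1e6, 4e6, 1e5],
    bandwidth_bps=1e6,
)
print(k, latency)
```

With these made-up numbers the cut lands after the first layer, where the activation is small. With a much faster link, sending the raw input to the server (cut 0) becomes cheaper, which is the kind of coupling between split point and bandwidth that a joint formulation has to handle.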


Predictive GAN-powered Multi-Objective Optimization for Hybrid Federated Split Learning

arXiv.org Artificial Intelligence

As an edge intelligence algorithm for multi-device collaborative training, federated learning (FL) reduces the communication burden but increases the computing load of wireless devices. In contrast, split learning (SL) reduces the computing load of devices by using model splitting and assignment, but increases the communication burden because intermediate results must be transmitted. In this paper, to exploit the advantages of FL and SL, we propose a hybrid federated split learning (HFSL) framework in wireless networks, which combines the multi-worker parallel update of FL and flexible splitting of SL. To reduce the computational idleness in model splitting, we design a parallel computing scheme for model splitting without label sharing, and theoretically analyze the influence of the delayed gradient caused by the scheme on the convergence speed. Aiming to obtain the trade-off between the training time and energy consumption, we optimize the splitting decision, the bandwidth and computing resource allocation. The optimization problem is multi-objective, and we thus propose a predictive generative adversarial network (GAN)-powered multi-objective optimization algorithm to obtain the Pareto front of the problem. Experimental results show that the proposed algorithm outperforms others in finding Pareto optimal solutions, and the solutions of the proposed HFSL dominate the solution of FL.
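To make the notion of a Pareto front over training time and energy concrete, here is a minimal sketch that filters a set of candidate (time, energy) pairs down to the non-dominated ones. It is a plain pairwise dominance check, not the GAN-powered algorithm the abstract describes, and the candidate values are invented for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (training time, energy), both to be minimised


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates.

    q dominates p if q is no worse in both objectives and strictly
    better in at least one.
    """
    front = []
    for i, p in enumerate(points):
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical (time, energy) outcomes of five resource allocations.
candidates = [(10, 5), (8, 7), (12, 4), (9, 9), (8, 6)]
print(pareto_front(candidates))
```

In this example, (8, 7) and (9, 9) drop out because (8, 6) and (8, 7) respectively are at least as good in both objectives. The remaining three points each trade time against energy and cannot be ranked without a preference between the two.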


JMSNAS: Joint Model Split and Neural Architecture Search for Learning over Mobile Edge Networks

arXiv.org Artificial Intelligence

The main challenge in deploying a deep neural network (DNN) over a mobile edge network is how to split the DNN model so as to match the network architecture as well as all the nodes' computation and communication capacity. This essentially involves two highly coupled procedures: model generating and model splitting. In this paper, a joint model split and neural architecture search (JMSNAS) framework is proposed to automatically generate and deploy a DNN model over a mobile edge network. Considering both the computing and communication resource constraints, a computational graph search problem is formulated to find the multi-split points of the DNN model, and then the model is trained to meet some accuracy requirements. Moreover, the trade-off between model accuracy and completion latency is achieved through the proper design of the objective function. The experimental results confirm the superiority of the proposed framework over the state-of-the-art split machine learning design methods.


Distributed Training on AWS SageMaker

#artificialintelligence

In today's world, with access to huge datasets and ever deeper and bigger deep learning models, training on a single GPU on a local machine can quickly become a bottleneck. Some models won't even fit on a single GPU, and even if they do, training can be painfully slow. With large training data and a large model, running a single experiment can take weeks or months. As a result, it can hamper research and development and increase the time needed to build proofs of concept (POCs). Fortunately, cloud compute is available, which lets one set up remote machines and configure them to the project's requirements.


Distributed solving through model splitting

arXiv.org Artificial Intelligence

Constraint problems can be trivially solved in parallel by exploring different branches of the search tree concurrently. Previous approaches have focused on implementing this functionality in the solver, more or less transparently to the user. We propose a new approach, which modifies the constraint model of the problem. An existing model is split into new models with added constraints that partition the search space. Optionally, additional constraints are imposed that rule out the search already done. The advantages of our approach are that it can be implemented easily, computations can be stopped and restarted, moved to different machines and indeed solved on machines which are not able to communicate with each other at all.
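The idea of splitting a constraint model into sub-models with added constraints can be sketched in a few lines. The toy brute-force `solve` function, the dictionary-based model representation, and the choice to partition on one variable's domain are all assumptions for illustration; the abstract does not say how the authors pick the partitioning constraints.

```python
from itertools import product


def solve(domains, constraints):
    """Toy solver: enumerate all assignments and keep those that satisfy every constraint."""
    names = list(domains)
    solutions = []
    for values in product(*(domains[n] for n in names)):
        assignment = dict(zip(names, values))
        if all(c(assignment) for c in constraints):
            solutions.append(assignment)
    return solutions


def split_model(domains, constraints, var, parts):
    """Split one model into up to `parts` models whose added constraints partition the domain of `var`."""
    values = list(domains[var])
    submodels = []
    for i in range(parts):
        chunk = frozenset(values[i::parts])
        if not chunk:
            continue
        # The added constraint restricts `var` to this chunk only.
        extra = lambda a, chunk=chunk: a[var] in chunk
        submodels.append((domains, constraints + [extra]))
    return submodels


# Hypothetical model: x + y == 5 with x < y, both in 1..4.
domains = {"x": range(1, 5), "y": range(1, 5)}
constraints = [lambda a: a["x"] + a["y"] == 5, lambda a: a["x"] < a["y"]]

# Each sub-model could be handed to a separate machine; here they run in sequence.
for sub_domains, sub_constraints in split_model(domains, constraints, "x", 2):
    print(solve(sub_domains, sub_constraints))
```

Because the added constraints are mutually exclusive and together cover the whole domain of `x`, the sub-models' solution sets are disjoint and their union equals the solutions of the original model. No communication between the machines is needed. A real solver would propagate the added constraint to prune the search, whereas this toy enumerator still visits every assignment.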